Although 3D human reconstruction methods using pixel-aligned implicit functions (PIFu) have developed rapidly, we observe that the quality of the reconstructed details is still unsatisfactory. Flat facial surfaces frequently occur in PIFu-based reconstruction results. To this end, we propose a dual-PIFu representation to improve the quality of the reconstructed facial details. Specifically, we use two MLPs to represent the PIFu of the face and of the human body separately. Dedicating an MLP to 3D face reconstruction increases the network capacity and reduces the difficulty of reconstructing facial details compared with the previous single-level PIFu representation. To resolve topological errors, we use three RGBD sensors to capture multi-view RGBD data as the input to the network, a sparse and lightweight capture setup. Since depth noise severely affects the reconstruction results, we design a depth refinement module to reduce the noise of the raw depth maps under the guidance of the input RGB images. We also propose an adaptive fusion scheme to blend the predicted occupancy field of the body with that of the face, eliminating the discontinuity artifacts at their boundary. Experiments demonstrate the effectiveness of our approach in reconstructing vivid facial details and deforming body shapes, and verify its superiority over state-of-the-art methods.
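To make the adaptive fusion idea concrete, below is a minimal sketch of blending two predicted occupancy fields near the face boundary. The distance-based sigmoid weight and all names (fuse_occupancy, blend_width) are illustrative assumptions, not the paper's exact scheme.

```python
import torch

def fuse_occupancy(occ_body, occ_face, dist_to_face_region, blend_width=0.05):
    """Blend body and face occupancy predictions near the face boundary.

    occ_body, occ_face: (N,) predicted occupancy values for N query points.
    dist_to_face_region: (N,) signed distance of each query point to the
        face bounding region (negative = inside the face region).
    blend_width: width of the transition band (a hypothetical parameter).
    """
    # The weight goes smoothly from 1 (inside the face region) to 0 (far outside),
    # which removes the hard seam a binary switch between the two fields would create.
    w = torch.sigmoid(-dist_to_face_region / blend_width)
    return w * occ_face + (1.0 - w) * occ_body
```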
With the advancement of sensing technology, multivariate time series classification (MTSC) has recently received considerable attention. Deep-learning-based MTSC techniques, which mainly rely on convolutional or recurrent neural networks, primarily model the temporal dependency within a single time series. As a result, they struggle to directly express pairwise dependencies among the multivariate variables. Moreover, current spatial-temporal modeling methods (e.g., graph classification) based on graph neural networks (GNNs) are inherently flat and cannot aggregate hub information in a hierarchical manner. To address these limitations, we propose a novel graph-pooling-based framework, MTPool, to obtain an expressive global representation of MTS. We first convert MTS slices into graphs by exploiting the interactions among variables through a graph structure learning module, and obtain spatial-temporal graph node features through a temporal convolution module. To obtain the global graph-level representation, we design an "encoder-decoder"-based variational graph pooling module that creates adaptive centroids for cluster assignment. We then combine GNNs and our proposed variational graph pooling layers for joint graph representation learning and graph coarsening, after which the graph is progressively coarsened into one node. Finally, a differentiable classifier takes this coarsened representation to obtain the final predicted class. Experiments on ten benchmark datasets show that MTPool outperforms state-of-the-art strategies on the MTSC task.
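A minimal sketch of the centroid-based soft cluster assignment that underlies this kind of graph pooling is given below; it omits the variational ("encoder-decoder") part and uses assumed names and dimensions, so it illustrates the coarsening step rather than MTPool's exact formulation.

```python
import torch
import torch.nn as nn

class CentroidPool(nn.Module):
    """Softly assign N node features to K learnable centroids and coarsen the graph."""

    def __init__(self, in_dim, num_clusters):
        super().__init__()
        self.centroids = nn.Parameter(torch.randn(num_clusters, in_dim))

    def forward(self, x, adj):
        # x: (N, F) node features, adj: (N, N) adjacency matrix.
        dist = torch.cdist(x, self.centroids)      # (N, K) distances to centroids
        assign = torch.softmax(-dist, dim=-1)      # (N, K) soft cluster assignment
        x_coarse = assign.t() @ x                  # (K, F) pooled node features
        adj_coarse = assign.t() @ adj @ assign     # (K, K) coarsened adjacency
        return x_coarse, adj_coarse, assign
```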
Multivariate time series (MTS) forecasting, which analyzes historical time series to predict future trends, can effectively assist decision making. The complex relationships among variables in MTS, including static, dynamic, predictable, and latent relationships, make it possible to mine more features of MTS. Modeling these complex relationships is not only necessary for characterizing the latent dependencies and modeling temporal dependence, but also poses a great challenge for MTS forecasting tasks. However, existing methods mainly focus on modeling only certain relationships among MTS variables. In this paper, we propose a novel end-to-end deep learning model, termed Multivariate Time Series Forecasting via Heterogeneous Graph Neural Networks (MTHetGNN). To characterize the complex relationships among variables, a relation embedding module is designed in MTHetGNN, where each variable is regarded as a graph node and each type of edge represents a specific static or dynamic relationship. Meanwhile, a temporal embedding module is introduced for time series feature extraction, which involves convolutional neural network (CNN) filters with different perception scales. Finally, a heterogeneous graph embedding module is adopted to handle the complex structural information generated by the two modules. Three benchmark datasets from the real world are used to evaluate the proposed MTHetGNN. Comprehensive experiments show that MTHetGNN achieves state-of-the-art results on MTS forecasting tasks.
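The temporal embedding idea, CNN filters with different perception scales producing one feature vector per variable (i.e., per graph node), can be sketched as follows; the kernel sizes, channel counts, and global average pooling are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

class MultiScaleTemporalEmbedding(nn.Module):
    """Extract per-variable features with 1D convolutions of different kernel sizes."""

    def __init__(self, out_channels=16, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(1, out_channels, k, padding=k // 2) for k in kernel_sizes]
        )

    def forward(self, x):
        # x: (batch, num_vars, window) multivariate time series.
        b, n, w = x.shape
        x = x.reshape(b * n, 1, w)                              # one channel per variable
        feats = [conv(x).mean(dim=-1) for conv in self.convs]   # global pooling over time
        feats = torch.cat(feats, dim=-1)                        # concatenate the scales
        return feats.reshape(b, n, -1)                          # one feature vector per node
```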
Purpose: To develop CADIA, a supervised deep learning model based on a region proposal network coupled with a false-positive reduction module for the detection of intracranial aneurysms (IAs) on computed tomography angiography (CTA), and to compare the performance of our model with similar detection networks. Methods: In this retrospective study, we evaluated saccular IAs >= 2.5 mm from two separate patient cohorts. A two-step model was implemented: a 3D region proposal network for initial aneurysm detection, and 3D DenseNets for false-positive reduction and further determination of suspected IAs. Free-response receiver operating characteristic (FROC) curves were generated, and patient-level performance was reported at established false positives per volume (FPPV) thresholds. Fisher's exact test was used for comparison with similar available models. Results: The sensitivities of CADIA at 0.25 and 1 FPPV were 63.9% and 77.5%, respectively. Our model's performance varied with size and location, and the best performance was achieved for IAs of 5-10 mm and those at the anterior communicating artery, with sensitivities of 95.8% and 94%, respectively. Compared with available models at 0.25 FPPV, our model showed statistically higher patient-level accuracy, sensitivity, and specificity. At the 1 FPPV threshold, our model showed better accuracy and specificity (P <= 0.001) and equivalent sensitivity. Conclusions: CADIA outperformed comparable networks on the IA detection task. Adding a false-positive reduction module is a feasible step toward improving IA detection models.
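For illustration, the sensitivity-at-FPPV operating points reported above can be read off a list of scored detections as in the following sketch; it assumes detections have already been matched to ground-truth lesions (at most one detection per lesion), and all names are hypothetical rather than part of the study's pipeline.

```python
import numpy as np

def sensitivity_at_fppv(scores, is_true_positive, num_lesions, num_volumes, target_fppv):
    """Sweep the confidence threshold and report the sensitivity reached while the
    false-positive rate stays at or below target_fppv false positives per volume."""
    order = np.argsort(-scores)                  # highest-confidence detections first
    tp = np.cumsum(is_true_positive[order])      # cumulative true detections
    fp = np.cumsum(~is_true_positive[order])     # cumulative false detections
    fppv = fp / num_volumes
    valid = fppv <= target_fppv
    if not valid.any():
        return 0.0
    return tp[valid][-1] / num_lesions
```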
In many real-world applications, the ability to compute similarity scores between graphs based on metrics such as graph edit distance (GED) is important. Computing exact GED values is typically an NP-hard problem, and traditional algorithms usually achieve an unsatisfactory trade-off between accuracy and efficiency. Recently, graph neural networks (GNNs) have provided a data-driven solution for this task that is more efficient while maintaining prediction accuracy in similarity computation for small graphs (around 10 nodes per graph). Existing GNN-based methods either embed the two graphs separately (lacking low-level cross-graph interactions) or deploy cross-graph interactions over the whole graph pair (which is redundant and time-consuming), and they struggle as the number of nodes in the graphs increases. In this paper, we focus on similarity computation for large-scale graphs and propose an "embedding-coarsening-matching" framework, CoSimGNN, which first embeds and coarsens large graphs with adaptive pooling operations and then deploys fine-grained interactions on the coarsened graphs to obtain the final similarity scores. In addition, we create several synthetic datasets that provide new benchmarks for graph similarity computation. Detailed experiments have been conducted on both synthetic and real-world datasets, and CoSimGNN achieves the best performance while its inference time is at most 1/3 of that of the previous state-of-the-art.
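Roughly, the "embedding-coarsening-matching" pipeline can be sketched as below: each graph is embedded, softly pooled to k coarse nodes, and the two coarse node sets are compared. The single message-passing step and cosine-similarity matching are simplifying assumptions for the sketch, not CoSimGNN's actual layers.

```python
import torch
import torch.nn as nn

class EmbedCoarsenMatch(nn.Module):
    """Embed two graphs, pool each to k coarse nodes, then compare the coarse nodes."""

    def __init__(self, in_dim, hid_dim=32, k=10):
        super().__init__()
        self.embed = nn.Linear(in_dim, hid_dim)
        self.pool_scores = nn.Linear(hid_dim, k)   # soft assignment to k coarse nodes

    def encode(self, x, adj):
        h = torch.relu(adj @ self.embed(x))                   # one message-passing step
        assign = torch.softmax(self.pool_scores(h), dim=0)    # (N, k) node-to-coarse-node weights
        return assign.t() @ h                                 # (k, hid_dim) coarse node features

    def forward(self, x1, adj1, x2, adj2):
        c1, c2 = self.encode(x1, adj1), self.encode(x2, adj2)
        # Fine-grained interaction on the coarsened graphs: pairwise cosine similarities.
        sim = torch.cosine_similarity(c1.unsqueeze(1), c2.unsqueeze(0), dim=-1)
        return sim.mean()                                     # scalar similarity score
```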
Multivariate time series (MTS) forecasting is an important problem in many fields, and accurate forecasting results can effectively assist decision making. To date, many MTS forecasting methods have been proposed and widely applied. However, these methods assume that the predicted value of a single variable is affected by all other variables, which ignores the causal relationships among variables. To address this issue, we propose a novel end-to-end deep learning model, referred to in this paper as the Neural Granger Causality Graph Neural Network (CauGNN). To characterize the causal information among variables, we introduce a neural Granger causality graph into the model. Each variable is regarded as a graph node, and each edge represents a causal relationship between variables. In addition, convolutional neural network (CNN) filters with different perception scales are used for time series feature extraction, which generates the features of each node. Finally, a graph neural network (GNN) is adopted to tackle the forecasting problem on the graph structure generated from the MTS. Three benchmark datasets from the real world are used to evaluate the proposed CauGNN. Comprehensive experiments show that the proposed method achieves state-of-the-art results on MTS forecasting tasks.
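As a rough illustration of how a Granger-style causality graph could be built before feeding it to the GNN, the sketch below uses a simple residual-variance-reduction proxy; the paper's neural Granger causality test is more involved, and the lag and threshold here are arbitrary assumptions.

```python
import numpy as np

def granger_adjacency(series, lag=2, reduction=0.05):
    """Build a causal adjacency matrix: A[i, j] = 1 if adding variable j's lags
    noticeably reduces the residual variance of predicting variable i
    (a simple proxy, not the full Granger F-test).

    series: (num_vars, T) array of aligned time series."""
    n_vars, T = series.shape
    A = np.zeros((n_vars, n_vars))

    def residual_var(target, predictors):
        # Regress target[t] on lags 1..lag of every predictor via least squares.
        X = np.column_stack(
            [p[lag - k: T - k] for p in predictors for k in range(1, lag + 1)]
        )
        y = target[lag:]
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return np.var(y - X @ beta)

    for i in range(n_vars):
        var_own = residual_var(series[i], [series[i]])
        for j in range(n_vars):
            if i != j:
                var_full = residual_var(series[i], [series[i], series[j]])
                if var_full < (1.0 - reduction) * var_own:
                    A[i, j] = 1.0
    return A
```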
Non-line-of-sight (NLOS) imaging aims to reconstruct the three-dimensional hidden scenes from the data measured in the line-of-sight, which uses photon time-of-flight information encoded in light after multiple diffuse reflections. The under-sampled scanning data can facilitate fast imaging. However, the resulting reconstruction problem becomes a severely ill-posed inverse problem, whose solution is highly likely to be degraded by noise and distortions. In this paper, we propose two novel NLOS reconstruction models based on curvature regularization, i.e., the object-domain curvature regularization model and the dual (i.e., signal and object)-domain curvature regularization model. Fast numerical optimization algorithms are developed relying on the alternating direction method of multipliers (ADMM) with the backtracking stepsize rule, which are further accelerated by GPU implementation. We evaluate the proposed algorithms on both synthetic and real datasets, which achieve state-of-the-art performance, especially in the compressed sensing setting. All our codes and data are available at https://github.com/Duanlab123/CurvNLOS.
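As a toy stand-in for the proposed models, the following sketch runs ADMM on a generic regularized linear inverse problem (an l1 term replaces the curvature regularizer, and no backtracking stepsize rule or GPU acceleration is included); it only illustrates the alternating-update structure.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def admm_inverse(A, b, D, lam=0.1, rho=1.0, iters=100):
    """Minimize ||Ax - b||^2 + lam * ||Dx||_1 with ADMM.

    A plays the role of a forward (light transport) operator and D of a
    discrete differential operator; this is a generic sparsity-regularized
    toy, not the paper's curvature-regularized NLOS model."""
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(D.shape[0])
    u = np.zeros(D.shape[0])
    lhs = A.T @ A + rho * D.T @ D                  # fixed system matrix for the x-update
    for _ in range(iters):
        x = np.linalg.solve(lhs, A.T @ b + rho * D.T @ (z - u))
        z = soft_threshold(D @ x + u, lam / rho)   # proximal step for the l1 term
        u = u + D @ x - z                          # dual ascent
    return x
```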
In this paper, we target the problem of learning a generalizable dynamic radiance field from monocular videos. Different from most existing NeRF methods that are based on multiple views, monocular videos only contain one view at each timestamp, thereby suffering from ambiguity along the view direction in estimating point features and scene flows. Previous studies such as DynNeRF disambiguate point features by positional encoding, which is not transferable and severely limits the generalization ability. As a result, these methods have to train one independent model for each scene and suffer from heavy computational costs when applied to the growing number of monocular videos in real-world applications. To address this, we propose MonoNeRF to simultaneously learn point features and scene flows with point trajectory and feature correspondence constraints across frames. More specifically, we learn an implicit velocity field to estimate point trajectories from temporal features with a Neural ODE, which is followed by a flow-based feature aggregation module to obtain spatial features along the point trajectory. We jointly optimize temporal and spatial features by training the network in an end-to-end manner. Experiments show that our MonoNeRF is able to learn from multiple scenes and support new applications such as scene editing, unseen frame synthesis, and fast novel scene adaptation.
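A minimal sketch of an implicit velocity field with explicit Euler integration of point trajectories is shown below; it stands in for the Neural-ODE solver and flow-based feature aggregation described above, and every name and dimension is an assumption.

```python
import torch
import torch.nn as nn

class VelocityField(nn.Module):
    """Implicit velocity field v(x, t): maps a 3D point and a time to a 3D velocity."""

    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, t):
        # x: (N, 3) points; t: scalar time broadcast to every point.
        t_col = torch.full_like(x[:, :1], float(t))
        return self.net(torch.cat([x, t_col], dim=-1))

def integrate_trajectory(field, x0, t0, t1, steps=16):
    """Advance point positions from t0 to t1 with simple Euler steps
    (a stand-in for a proper Neural-ODE solver)."""
    x, dt = x0, (t1 - t0) / steps
    for i in range(steps):
        x = x + dt * field(x, t0 + i * dt)
    return x
```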
In this paper, we propose a large-scale language pre-training for text GENeration using dIffusion modEl, which is named GENIE. GENIE is a pre-trained sequence-to-sequence text generation model which combines Transformer and diffusion. The diffusion model accepts the latent information from the encoder, which is used to guide the denoising of the current time step. After multiple such denoising iterations, the diffusion model can restore the Gaussian noise to the diverse output text which is controlled by the input text. Moreover, such an architecture design also allows us to adopt large-scale pre-training on GENIE. We propose a novel pre-training method named continuous paragraph denoise based on the characteristics of the diffusion model. Extensive experiments on the XSum, CNN/DailyMail, and Gigaword benchmarks show that GENIE achieves performance comparable to various strong baselines; in particular, the generation quality of GENIE is greatly improved after pre-training. We have also conducted extensive experiments on the generation diversity and the impact of parameters in GENIE. The code for GENIE will be made publicly available.
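To illustrate the conditional denoising loop in latent space, here is a heavily simplified sketch; the update rule is not the exact diffusion math used by GENIE, cross-attention to the encoder states is replaced by feature concatenation, and all names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class LatentDenoiser(nn.Module):
    """Predict the noise in a latent text representation, conditioned on an
    encoder output of the source text (cross-attention omitted for brevity)."""

    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2 + 1, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, z_t, cond, t):
        t_col = torch.full_like(z_t[:, :1], float(t))
        return self.net(torch.cat([z_t, cond, t_col], dim=-1))

@torch.no_grad()
def sample(denoiser, cond, num_steps=50, dim=256):
    """Start from Gaussian noise and iteratively denoise toward a text latent,
    guided by the source-text condition at every step."""
    z = torch.randn(cond.shape[0], dim)
    for t in reversed(range(num_steps)):
        eps_hat = denoiser(z, cond, t / num_steps)
        z = z - eps_hat / num_steps   # simplified update, not the exact DDPM schedule
    return z
```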
Structured tabular data exist across nearly all fields. Reasoning tasks over these data aim to answer questions or determine the truthfulness of hypothesis sentences by understanding the semantic meaning of a table. While previous works have devoted significant efforts to the tabular reasoning task, they always assume there are sufficient labeled data. However, constructing reasoning samples over tables (and related text) is labor-intensive, especially when the reasoning process is complex. When labeled data is insufficient, the performance of models suffers a severe decline. In this paper, we propose a unified framework for unsupervised complex tabular reasoning (UCTR), which generates sufficient and diverse synthetic data with complex logic for tabular reasoning tasks, assuming no human-annotated data at all. We first utilize a random sampling strategy to collect diverse programs of different types and execute them on tables based on a "Program-Executor" module. To bridge the gap between the programs and natural language sentences, we design a powerful "NL-Generator" module to generate natural language sentences with complex logic from these programs. Since a table often occurs with its surrounding texts, we further propose novel "Table-to-Text" and "Text-to-Table" operators to handle joint table-text reasoning scenarios. This way, we can adequately exploit the unlabeled table resources to obtain a well-performing reasoning model under an unsupervised setting. Our experiments cover different tasks (question answering and fact verification) and different domains (general and specific), showing that our unsupervised methods can achieve up to 93% of the performance of supervised models. We also find that it can substantially boost the supervised performance in low-resourced domains as a data augmentation technique. Our code is available at https://github.com/leezythu/UCTR.
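A toy version of the "sample a program, execute it on the table" step might look like the following; the program templates, table schema, and field names are invented for the example and do not reflect UCTR's actual program grammar.

```python
import random

# Tiny illustration of sampling and executing programs over a table to build
# synthetic (program, answer) training pairs. Templates and schema are made up.
PROGRAMS = [
    ("count", lambda rows, col, _: len(rows)),
    ("max", lambda rows, col, _: max(r[col] for r in rows)),
    ("filter_count", lambda rows, col, val: sum(1 for r in rows if r[col] == val)),
]

def sample_and_execute(table, numeric_cols, categorical_cols):
    """Randomly pick a program type, instantiate its arguments from the table,
    and execute it to obtain one synthetic reasoning sample."""
    name, fn = random.choice(PROGRAMS)
    if name == "filter_count":
        col = random.choice(categorical_cols)
        val = random.choice([r[col] for r in table])
    else:
        col = random.choice(numeric_cols)
        val = None
    return {"program": name, "column": col, "value": val, "answer": fn(table, col, val)}

table = [{"city": "A", "pop": 5}, {"city": "B", "pop": 9}, {"city": "A", "pop": 3}]
print(sample_and_execute(table, numeric_cols=["pop"], categorical_cols=["city"]))
```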